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Abstract 

We extend the Effective Fragment Molecular Orbital (EFMO) method to the frozen domain 
approach where only the geometry of an active part is optimized, while the many-body polarization 
effects are considered for the whole system. The new approach efficiently mapped out the entire 
reaction path of chorismate mutase in less than four days using 80 cores on 20 nodes, where the 
whole system containing 2398 atoms is treated in the ab initio fashion without using any force fields. 
The reaction path is constructed automatically with the only assumption of defining the reaction 
coordinate a priori. We determine the reaction barrier of chorismate mutase to be 18.3 ± 3.5 kcal 
mor^ for MP2/cc-pVDZ and 19.3 ± 3.6 for MP2/cc-pVTZ in an ONIOM approach using EFMO- 
RHF/6-31G(d) for the high and low layers, respectively. 

Introduction 

Fragment-based quantum mcchanieal methods [TMl2) arc becoming increasingly popular |13j . and have 
been used to describe a very diverse set of molecular properties for large systems. Although these 
methods have been applied to refine the energetics of some enzymatic reactions |141I15| they are usually 
not efficient enough to allow for many hundreds of single point calculations needed to map out a reaction 
path for a system containing thousands of atoms, although geometry optimizations of large systems 
can be performed for systems consisting of several hundreds of atoms [51I^ [TT1[T5HT5] . In fact, typically 
apphcations of fragment-based methods to biochemical systems, for example, to protein-ligand binding 
[19| . are based on performing a few single point calculations for structures obtained at a lower level of 
theory (such as with force fields). Although many force fields are well tuned to treat typical proteins, for 
ligands they can be problematic. 
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In this work we extend the effective fragment molecular orbital (EFMO) method [2011^ into the frozen 
domain (FD) formalism [TS], originally developed for the fragment molecular orbital (FMO) method 
For FMO, there is also partial energy gradient method |26| . 

EFMO is based on dividing a large molecular system into fragments and performing ab initio calcu- 
lations of fragments and their pairs, and combining their energies in the energy of the whole system (see 
more below). In the FD approach we employ here, one defines an active region associated with the active 
site, and the cost of a geometry optimization is then essentially given by the cost associated with the 
active region. 

However, unlike the quantum-mechanical / molecular mechanical (QM/MM) method [27] with non- 
polarizable force fields, the polarization of the whole system is accounted for in FMO and EFMO methods: 
in the former via the explicit polarizing potential and in the latter via fragment polarizabilities. Another 
important difference between EFMO and QM/MM is that the former docs not involve force fields, and 
the need to elaborately determine parameters for ligands does not exist in EFMO. Also, in EFMO all 
fragments are treated with quantum mechanics, and the problem of the active site size |28j does not arise. 

The paper is organized as follows: First, we derive the EFMO energy and gradient expressions for 
the frozen domain approach, when some part of the system is frozen during the geometry optimization. 
Secondly, we predict the reaction barrier of barrier of the conversion of chorismate to prephenate (Figure[T|) 
in chorismate mutase. The reaction has been studied previously using conventional QM/MM techniques 
PSI - HO] . The EFMO method is similar in spirit to QM/MM in using a cheap model for the less important 
part of the system and the mapping is accomplished with a reasonable amount of computational resources 
(four days per reaction path using 80 CPU cores). Finally we summarize our results and discuss future 
directions. 

Background and Theory 

The EFMO energy of a system of N fragments (monomers) is 

N fl/,J<flrc=dim -R./,J>flrc=dim 

i?™ = E^?+ E {^Ei^Efp^)+ E ^FI+<o?" (1) 
/ IJ IJ 
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where E'j is the gas phase energy of monomer /. The second sum in equation [T] is the pairwise correction 
to the monomer energy and only appHes for pairs of fragments (dimers) separated by an interfragment 
distance Rj j (defined previously [20]) less than a threshold i?rosdim- The correction for dimer IJ is 

AE'jj^E'^jj-E'j-E'j. (2) 

Eff^ and Ef^^ are the classical pair polarization energy of dimer IJ and the classical total polarization 
energy, respectively. Both energies are evaluated using the induced dipole model [411142] based on dis- 
tributed polarizabilities [33] . The final sum over Eff is the classical electrostatic interaction energy and 
applies only to dimers separated by a distance greater than i?rosdim- These energies are evaluated using 
atom-centered multipole moments through quadrupoles j44) . The multipole moments and distributed 
polarizabilities arc computed on the fly for each fragment [^121) . 

In cases where only part of a molecular system is to be optimized by minimizing the energy, equation[T] 
can be rewritten, resulting in a method conceptually overlapping with QM/MM in using a cheap model 
for the less important part of the system. Consider a system S (Figure ^ where we wish to optimize the 
positions of atoms in region A, while keeping the atoms in region b and F frozen (the difference between 
b and F will be discussed below). With this definition, we rewrite the EFMO energy as 

E - l^p + F^ + Fj^ + bp^^ + bp/j^ + bj^n^ + i^tot , {6) 

where E\ is the internal energy of region A. Region A is made of fragments contaning atoms whose 
position is optimized, and A can also have some frozen atoms 

AT Rl,.J<Rraad 

leA I,Ji£A 

Similarly, E^ is the internal energy of b 

N Rl..J<Rraadi 

Region A is surrounded by a buffer b, because fragment pairs computed with QM containing one fragment 
outside of A (i.e., in b) can still contribute to the total energy gradient (see below). On the other hand. 



-R/.J>-Rrosdim 

{AElj-Ef?^)+ E Efj- 



(4) 



{AEh - EfO^) + 



fll,J>-Rrcsdi„ 



i,.Jeb 



E 



ES 
IJ ' 



(5) 
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fragment pairs with one fragment in F can also contribute to the total gradient, but they are computed 
using a simple classical expression rather than with QM. Note that the relation between the notation 
used in FMO/FD and that we use here is as follows: A,F and S arc the same. The buffer region B 
includes A, but b does not, i.e., A and b share no atoms. Formally, A and b are always treated at the 
same level of theory by assigning fragments to the same layer. 

In the EFMO method, covalent bonds between fragments are not cut. Instead, electrons from a bond 
connecting two fragments are placed entirely to one of the fragments. The electrons of the fragments 
are kept in place by using frozen orbitals across the bond. [21]|45l|46] Fragments connected by a covalent 
bond share atoms (Figure [Sj through the bonding region so it is possible that one side changes the wave 
function of the bonding region |21| . It is therefore necessary to re-evaluate the internal ab initio energy 
of region b for each new geometry step. 

The internal geometries of fragments in region F are completely frozen so the internal energy is 
constant and is therefore neglected 

= 0. (6) 

However, it is still necessary to compute the multipole moments and polarizability tensors (and therefore 
the wave function) of the fragments in F once at the beginning of a geometry-optimization to evaluate 
Ef^^ in equation [3] as well as some inter-region interaction energies defined as 

Rl.J<Rrosdin, fl/,J>-Rro,dim 

leb leb 

E°Fft = 0. (9) 

Equation [5] assumes that 6 is chosen so that fragments in A and F are sufficiently separated (i.e., > 
^rosdim) SO thc interaction is evaluated classically. If all atoms in region b are frozen, then Ep^^^ is constant 
and can be neglected. However, this assumes that the positoins of all atoms at both sides of the bonds 
connecting fragments are frozen. 
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The final expression for tlie EFMO frozen domain (EFMO/FD) energy is 

^EFMO ^ eO ^ ^0 + ^0^^ ^ ^ ^^^^^^ ^^^^ 

Finally, we note that due to the frozen geometry of b we can further gain a speedup by not evaluating 
dimers in h (cross terms between A and h are handled explicitly according to equation [7]) since they 
do not contribute to the energy or gradient of A. This corresponds to the frozen domain with dimers 
(EFMO/FDD), and equation [5] becomes 

Ei=j:^Ei (11) 



The gradient of each region is 

dxA dxA dxA dxA dxA ' 

^^EFMO 



(12) 



dxb 

^^EFMO 

dxp 



0, (13) 
0, (14) 



and the details of their evaluation has been discussed previously [20j[2T] . Equation [13] does not apply to 
non-frozen atoms shared with region A. 

The frozen domain formulation of EFMO was implemented in GAMESS [47] and parallelized using 
the generalized distributed data interface [351112] ■ 



Computational Details 

Preparation of the Enzyme Model 

We followed the strategy by Claeyssens et al. [30] The structure of chorismatc mutase (PDB: 2CHT) 
solved by Chook et al. [SD] was used as a starting point. Chains A, B and C were extracted using 
PyMOL [SI] and subsequently protonated with PDB2PQR [52l|53] and PROPKA [54] at pH = 7. The 
protonation state of all residues can be found in Table SI. The inhibitor between chain A and C was 
replaced with chorismatc in the reactant state (1, Figure |T]) modeled in Avogadro [551(56] . 
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The entire complex (chorismatc niutasc and chorismatc) was solvated in water (TIP3P |57p using 
GROMACS. j551[5U] To neutralize the system 11 Na+ counter ions were added. The protein and counter 
ions were treated with the CHARMM27 [201111] force field in GROMACS. Force-field parameters for 
chorismatc were generated using the SwissParam [62] tool. To equilibrate the complex a 100 ps NVT 
run at T 300 K was followed by a 100 ps NPT run at P = 1 bar and T = 300 K. The production 
run was an isothcrmal-isobaric trajectory run for 10 ns. A single conformation was randomly selected 
from the last half of the simulation and energy minimized in GROMACS to a force threshold of Fmax = 
300KJmol^^ nm^^. During equilibration and final molecular dynamics (MD) simulation, the C3 and C4 
atoms of chorismate (see Figure [1]) were harmonically constrained to a distance of 3.3 A to keep it in the 
reactant state. Finally, a sphere of 16 A around the CI atom of chorismate was extracted in PyMOL 
and hydrogens were added to correct the valency where the backbone was cut. The final model contains 
a total of 2398 atoms. 

Mapping the Reaction Path 

To map out the reaction path, we define the reaction coordinate similarly to Claeyssens et al. |40| as the 
difference in bond length between the breaking 02-Cl bond and the forming C4-C3 bond in chorismate 
(see also Figure [T]), i.e. 

i? = i?21 -i?43. (15) 

The conversion of chorismate {R = —2.0 A, R21 = 1.4 A, i?43 = —3.4 A) to prephcnate (R = 1.9 
A, i?2i = 3.3 A, i?43 = 1.4 A) in the enzyme was mapped by constraining the two bond lengths in 
equation ITS] with a harmonic force constant of 500 kcal mol^^ A^^ in steps of 0.1 A. For each step, all 
atoms in the active region {A) were minimized to a threshold on the gradient of 5.0 • 10^** Hartree Bohr^^ 
(OPTTOL=5.0c-4 in SSTATPT). For the enzyme calculations we used EFMO-RHF and FM02-RHF 
with the frozen domain approximation presented above. 

We used two different sizes for the active region small: (EFMO:S, Figure [4]) and large (EFMO:L, 
Figure [5]). The active region (colored red in Figure [2]) is defined as all fragments with a minimum 
distance inactive from any atom in chorismate (EFMO:S : i?activc = 2.0 A, EFMO:L : i?activo ~ 3.0 A). 
In EFMO:S the active region consists of chorismate, 4 residues and 5 water molecules, while the active 
region in EFMO:L consists of chorismate, 11 residues and 4 water molecules. The buffer region (blue in 
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Figure [5]) is defined as all fragments within 2.5 A of the active region for both EFMO:S and EFMO:L. 
The rest of the system is frozen. To prepare the input files we used Fragit [63j . which automatically 
divides the system into fragments; in this work we used the fragment size of one amino acid residue or 
water molecule per fragment. 

In order to refine the energetics, for each minimized step on the reaction path we performed two-layer 
ONIOM [MlllS] calculations 

Rihigh ^ Tallow , jTihigh 771I0W /i f.\ 

-'^roal ~ -^rcal + -C/modcl ^modcb UDj 

where = £;EFM0 according to equation O This can be considered a special case of the more general 
multicenter ONIOM based on FMO [BB], using EFMO instead of FMO. The high level model system is 
chorismate in the gas-phase calculated using B3LYP [STMS] (DFTTYP=B3LYP in SCONTRL) or MP2 
(MPLEVL=2 in SCONTRL) with either 6-31G(d) or the cc-pVDZ, cc-pVTZ and cc-pVQZ basis sets by 
Dunning [70] . 

We also carried out multilayer EFMO and FMO [7T] single-point calculations where region F is 
described by RHF/6-31G(d) and b and A (for EFMO) or B {B = AUb ioi FMO [H]) is calculated using 
MP2/6-31G(d). 

The FDD approximation in equation [TT1 is enabled by specifying M0DFD=3 in $FMO, similarly to 
the frozen domain approach in FMO [18) . All calculations had spherical contaminants removed from the 
basis set (ISPHER=1 in $CONTRL). 

Obtaining the Activation Enthalpy 

The activation enthalpy is obtained in two different ways by calculating averages of AI adiabatic reaction 
pathways. The starting points of the M pathways were randomly extracted from the MD simulation, 
followed by the reaction path mapping procedure described above for each pathway individually. One way 
to obtain the activation enthalpy averages the barriers from each individual adiabatic reaction path [72] 

1 

AhI = — V {ETs,^ - En,i) - 1.6kcalmor\ (17) 
M ^ — ' 

i=l 

Here M is the number of reaction paths {M = 7, Figure IH) i^TS,?: is the highest energy on the adiabatic 
reaction path while -Er^^ is the lowest energy with a negative reaction coordinate. 1.6 kcal mol~^ corrects 
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for the change in zero point energy and thermal contributions |72j . 
The other way of estimating the activation enthalpy is [3 7) : 

AH^ = (Ets) - (Er) - 1.6kcalmor^ (18) 

Here (Ets) a-nd (-Er) are, respectively, the highest energy and lowest energy with a negative reaction 
coordinate on the averaged adiabatic path (bold line in Figure (HI)- The brackets here mean averaging 
over 7 reaction paths; and the difference of Eqs [T7] and [15] arises because of the noncommutativity of 
the sum and the min/max operation over coordinates: in Eg 1171 we found a minimum and a maximum 
for each curve separately, and averaged the results, but in Eq[T5]wc first averaged and then found the 
extrema. As discussed below, the two reaction enthalpies are within 0.2 kcal/mol, which indicates that 
the TS occurs at roughly the same value of the reaction coordinate for most paths. 

Results and Discussion 

Effects of Methodology, Region Sizes and Approximations 

Reaction barriers obtained in the enzyme using harmonic constraints are plotted on Figure [5] and listed 
in Table [T] for different settings of region sizes and approximations. All calculated reaction barriers are 
within 0.5 kcal mol^^ from each other when going from the reactant {Rr) to the proposed transition 
(Rts) state where the reaction barriers for the TSs are around 46 kcal mol~^. The same is true when 
going to the product Rp. Only the large model (EFMO:L) shows a difference in energy near the product 
(i?p) with a lowering of the relative energy by 4 kcal mol~^ compared to the other settings. 

The reaction coordinates are also similar for the small systems (Rp = 1.41 A, except for i?rcsdim = 2.0 
which is Rp = 1.56 A) with some minor kinks on the energy surface from optimization of the structures 
without constraints at Rp. The EFMO:L model has a different reaction coordinate for the product 
{Rp — 1.57 A) and also a shifted reaction coordinate for the transition state Rts = —0.12 A which we 
can attribute to a better description of more separated pairs in the active region but more importantly 
that around the TS, the energy surface is very flat. Interestingly, using FM02 shows no significant change 
in either reaction barriers or reaction coordinates for the reactant, transition state or product which differ 
from EFMO:S by 0.02 A, 0.03 A and 0.01 A respectively. Timings are discussed below. 
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Previous work by Ranaghan et al. obtained an RHF barrier of 36.6 kcal mol^^ which is 10 

kcal/niol lower than what we obtained. Also, they observed that the transition state happened earlier 
at Rts ~ —0.3 A. The difference in reaction barrier from our findings is attributed to a poorer enzyme 
structure and other snapshots do yield similar or better reaction barriers (see below). Furthermore, the 
same study by Ranaghan ct al. found that the reaction is indeed exothermic with a reaction energy of 
around —30 kcal mol^^ at the RHF/6-31G(d) level of theory. We expect this difference from our results 
to arise from the fact the study by Ranaghan et al. used a fully flexible model for both the substrate 
and the enzyme where the entire protein is free to adjust contrary to our model where wc have chosen 
active fragments and atoms in a uniform sphere around a central fragment. This is perhaps not the best 
solution if one includes too few fragments (which lowers the computational cost) due to fragments in 
the buffer region are unable to move and cause steric clashes. The lowering of the energy for EFMO:L 
suggests this. 

Refined Reaction Energetics 

For the smallest EFMO:S system ONIOM results are presented on Figure [9] and in Table [2] for various 
levels of theory. By calculating the MP2/cc-pVDZ:EFMO-RHF/6-31G(d) energy using ONIOM we 
obtain a 19.8 kcal mol^^ potential energy barrier. Furthermore, the reaction energy is lowered from —1.3 
kcal mol^^ to —5.5 kcal mol^^. Increasing the basis set size through cc-pVTZ and cc-pVQZ reduces the 
barrier to 21.8 kcal mol^^ and 21.7 kcal mol^^, respectively and the reaction energy is -1.1 kcal mol^^ 
and 0.8 kcal mol~^. Using the smaller 6-31G(d) basis set with MP2, the reaction barrier is 22.2 kcal 
mol^^ and reaction energy is —3.2 kcal mol^^. The B3LYP results are improvements for the TS only 
reducing the barrier to 23.8 kcal mor^ for B3LYP/cc-pVDZ:EFMO-RHF/6-31G(d). The same is not 
true for the product where the energy is increased by about 3 kcal mol^^. For the other systems treated 
using EFMO-RIIF/6-31G(d) discussed in the previous section ONIOM corrected results at the MP2 or 
B3LYP level of theory using a cc-pVDZ basis set are listed in tables S2 to S5 and show differences from 
the above by less than 1 kcal mol~^, again the reaction coordinates changes slightly depending on the 
system. The effect of including correlation effects by means of MP2 and systematically larger basis sets is 
that the potential energy barrier for the reaction rises as more correlation effects are included, the same 
is true for the overall reaction energy. 

The results presented here for MP2 are in line with what has been observed previously by Ranaghan 
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et al. [37] and Claeysscns et al. [30] ■ Overall, the reaction barrier is reduced to roughly half of the RHF 
barrier and the observed coordinates for the reaction shift slightly. We do note that this study and the 
study by Ranaghan et al. use ONIOM style energy corrections for the correlation and not geometry 
optimizations done at a correlated level. Overall, we observe that the predicted reaction coordinate for 
the approximate transition state in the conversion of chorismate to prephenate happens around 0.2 A 
later than in those studies. 

The results for the multilayer single points along the energy surface are presented in Table [3l The 
barrier calculated at the EFMO-RHF:MP2/6-31G(d) level of theory is predicted to be 27.6 kcal mol"^ 
which is 5.4 kcal mol^^ higher than the ONIOM barrier and the reaction coordinates are shifted for both 
the reactant and the TS from Rr = -1.95 A to Rr = -1.64 A and Rts = -0.36 A to Rts = -0.11 
A. Similar resuhs are obtained at the FM02-RHF:MP2/6-31G(d) level of theory. The difference from 
the ONIOM corrected values in table |3] is likely due to the inclusion of dispersion effects between the 
chorismate and the enzyme which is apparently weaker at the transition state compared to the reactant 
state. 

Ensemble Averaging 

In Figure|6]and Figure[7]we show 7 adiabatic reaction paths mapped with EFMO-RHF/6-31G(d) starting 
from 7 MD snapshots; the energetics were refined with ONIOM at the MP2/cc-pVDZ and MP2/cc-pVTZ 
level. In EFMO, we used a small active region (EFMO:S) and -Rrcsdim = 1-5 and no dimer calculations 
in region h (S15FD3 in Figure [SJ. Out of the 7 trajectories one is described in detail in the previous 
sub-section. 

For MP2/cc-pVDZ:EFMO-RHF/6-31G(d) the reaction enthalpies are Aij| = 18.3 ± 3.5 kcal mor^ 
and Aij| = 18.2 kcal mol^^ [cf. Equations ([T7)) and ([T5|) ]. the latter having an uncertainty of the mean of 
6.9 kcal mor^. For MP2/cc-pVTZ:EFMO-RHF/6-31G(d) the reaction enthalpies are = 19.3 ± 3.7 
kcal mol^^ and AiJ^ = 18.8 kcal mol^^ with an uncertainty of the mean of 7.1 kcal mol^^. These barriers 
are ca 5.5 (6.5) kcal mol^^ higher than the experimental value of 12.7±0.4 kcal mol^^ for MP2/cc-pVDZ 
(MP2/cc-pVTZ). For comparison, the activation enthalpy obtained by Claeyssens et al. [401172] (9.7 ± 1.8 
kcal mol~^) is underestimated by 3.0 kcal mol^^. 

There are several differences between our study and that of Claeyssens et al. that could lead to an 
overestimation of the barrier height: biasing the MD towards the TS rather than the reactant, a larger 
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enzyme model (7218 vs 2398 atoms), and more eonformational freedom when computing the potential 
energy profile. 

With regard to the latter point, while Figure [5] shows that inereasing the active region has a relatively 
small effect on the barrier this may not be the ease for all snapshots. We did identify one trajectory 
that failed to produce a meaningful reaction path and is presented in Figure SI. Here, the energy of the 
barrier becomes unrealistically high because of very little flexibility in the active site and unfortunate 
placement of Phe57 (located in the buffer region. Figure S2), which hinders the conformational change 
needed for the successful conversion to prephenate yielding an overall reaction energy of around +11 kcal 
mol~^. As noted above, the EFMO:L settings is a possible solution to this as more of the protein in 
available to move, but as seen from Table [T] the computational cost doubles. 

Timings 

Using the computationally most efficient method tested here (EFMO:S), i?rcsdim = 1-5, and skipping 
dimers in the buffer region 6, an adiabatic reaction path, which requires a total of 467 gradient evaluations, 
can be computed in four days using 80 CPU cores (20 nodes with 4 cores each) at the RHF/6-31G(d) 
level of theory. As shown in Table [1] the same calculation using FM02 requires takes roughly T/"/' = 7.5 
times longer. 

Increasing i?rcsdiin from 1.5 to 2 has a relatively minor effect of the CPU time (a factor of 1.2), while 
performing the dimer calculations in the buffer region nearly doubles (1.7) the CPU time. Similarly, 
increasing the size of active region from 2.0 A to 3.0 A around chorismate also nearly doubles (1.8) the 
CPU time. This is mostly due to the fact that more dimer calculations must be computed, but the 
optimizations also require more steps (513 gradient evaluations) to converge due to the larger number of 
degrees of freedom that must be optimized. 

Looking at a single minimization for a specific reaction coordinate R = —1.79 A, the most efficient 
method takes 4.5 hours. Here, the relative timings Trd are all larger than for the full run (T/"/') due to 
a slight increase in the number of geometry steps (around 25) taken for all but FM02 which is identical 
to the reference (22 steps). Thus, the overall cost of performing the FM02 minimization is 6.7 times as 
expensive as EFMO. 
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Summary and Outlook 

In this paper we have shown that the effective fragment molecular orbital (EFMO) method |20(l2T| can be 
used to efficiently map out enzymatic reaction paths provided the geometry of a large part of the enzyme 
and solvent is frozen. In EFMO one defines an active region associated with the active site, and the cost 
of a geometry optimization is then essentially the cost of running quantum-mechanical calculations of 
the active domain. This is similar to the cost of QM/MM, if the QM region is the same; the difference is 
that in EFMO wc freeze the coordinates of the rest of the system, whereas in QM/MM they are usually 
fully relaxed. On the other hand, EFMO does not require parameters and can be better considered an 
approximation to a full QM calculation rather than a QM/MM approach. 

In this work we used the mapping technique based on running a classical MD simulation, selecting some 
trajectories, freezing the coordinates of the outside region, and doing constrained geometry optimizations 
along a chosen reaction coordinate. An alternative to this approach is to run full MD simulation of a 
chemical reaction using EFMO. This has already been done for many chemical reactions using FMO- 
MD [75H75] and can be done in future with EFMO. 

A potential energy profile for the chorismatc to prcphenate reaction in chorismatc has been computed 
in 4 days using 80 CPU cores for an RIIF/6-31G(d) description of a truncated model of the enzyme con- 
taining 2398 atoms. For comparison, a corresponding FM02 calculation takes about 7.5 times more. The 
cost of EFMO calculations is mainly determined by the size of the buffer- and active region. Comparing 
to a QM/MM with QM region of the same size, EFMO as a nearly linear scaling method, becomes faster 
than QM if the system size is sufficiently large; especially for correlated methods like MP2 where this 
cross-over should happen with relatively small sizes. 

Our computed conformationally-averaged activation enthalpy is in reasonable agreement to the ex- 
perimental value, although overestimated by 5.5 kcal/mol. 

The energetics of this reaction depends on the level of calculation. We have shown that by using a 
level better than RHF, for instance, MP2 or DFT, considerably improves the energetics and by using 
such an appropriate level to also determine the reaction path following the formalism in this work can 
be used to provide a general and reliable way in future. 

EFMO, as one of the fragment-based methods [13], can be expected to be useful in various biochemical 
studies, such as in enzymatic catalysis and protcin-ligand binding. It should be noted that in addition to 
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its paramatcr-frcc ab initio based nature, EFMO and FMO also offer chemical insight on the processes 
by providing subsystem information, such as the properties of individual fragments (e.g., the polarization 
energy) as well as the pair interaction energies between fragments |76[I77) . This can be of considerable 
use to fragment-based drug discovery [78 1 179 ] . 
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Figure 1. Conversion of chorismate to prephenate through its transition state. Atoms of 
interest are marked with numbers one trough four. 




Figure 2. Definition of a system with active, buffer and frozen regions in frozen domain 
EFMO. 
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Figure 3. Cross region fragmentation. The fragmentation procedure shares an atom (here and 
is the shared atom) between two neighboring and covalently bonded fragments. Even though these 
fragments are in separate regions, they still share an atom across that region as illustrated. 



24 




Figure 4. EFMO:S model of chorismate mutase used in this study. The entire model contains 
2398 atoms. There are 1341 atoms in green belonging to the frozen region (F), 928 atoms in blue 
belonging to the buffer region (6) and 129 atoms in red belonging to the active region (A). 




Figure 5. EFMO:L model of chorismate mutase used in this study. The entire model contains 
2398 atoms. There are 1006 atoms in green belonging to the frozen region (F), 1151 atoms in blue 
belonging to the buffer region (6) and 241 atoms in red belonging to the active region {A). 



26 



Averaged Reaction Barrier for Chorismate jviutase 
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Figure 6. Reaction Enthalpy Profile for chorismate mutase. The 7 profiles are calculated with 
ONIOM at the MP2/cc-pVDZ:EFMO-RHF/6-31G(d) level of theory. The black line is the average 
reaction energy, grey lines are individual reaction paths. The blue line is the reaction path discussed in 
detail, in the results section. 
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Averaged Reaction Barrier for Chorismate jviutase 
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Figure 7. Reaction Enthalpy Profile for chorismate mutase. The 7 profiles are calculated with 
ONIOM at the MP2/cc-pVTZ:EFMO-RHF/6-31G(d) level of theory. The black line is the average 
reaction energy, grey lines are individual reaction paths. The blue line is the reaction path discussed in 
detail, in the results section. 
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Calculated reaction barriers for Chorismate Mutase 

60 1 I , , , , , 
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Figure 8. EFMO-RHF/6-31G(d) barrier for chorismate mutase. S15FD3 and S15FD3_FMO 
arc EFMO:S and FMO:S, respectively, both with i?rcsdim = 1-5, and the dimer approximation in region 
b (Equation [TT]) . S15FD1 is similar to S15FD3 but without the dimer approximation in region b. 
S20FD3 is also similar to S15FD3 but with i^iosdim = 2.0, instead. Finally, L15FD3 is EFMO:L with 
^rcsdim = 1-5, and the dimer approximation (FDD) in region b. 
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Calculated reaction barriers for Chorismate Mutase 
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Figure 9. ONIOM results calculated with various levels of theory for EFMO:S geometries. 
The red curve is the EFMO-RHF/6-31G(d) result also presented in Figured Blue (B3LYP) and green 
(MP2) curves are ONIOM results with chorismate calculated in the gas-phase using the 6-31G(d) (solid 
lines), cc-pVDZ (dashed lines) or cc-pVTZ (dotted line) basis set. 
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2 Tables 

Table 1. EFMO-RHF and FM02-RHF results for chorismate mutase. 



Model 


Rresdim 


modfd 


Rr 


Rts 


Rp 


Ets-r 


Ep^R 


Tic\ 




EFMO:S 


1.5 


3 


-1.95 


-0.36 


1.42 


46.25 


-1.32 


1.0 


1.0 


EFMO:S 


1.5 


1 


-1.96 


-0.36 


1.42 


46.49 


-1.34 


2.0 


1.7 


EFMO:S 


2.0 


3 


-1.96 


-0.12 


1.56 


46.46 


-0.91 


1.3 


1.2 


EFMO:L 


1.5 


3 


-1.97 


-0.35 


1.57 


46.42 


-3.61 


2.1 


1.8 


FM02:S 


1.5 


3 


-1.93 


-0.33 


1.41 


47.21 


-1.40 


6.7 


7.5 



Reaction barriers of chorismate mutase calculated with different levels of theory. Rresdim is unitless. 
The reaction coordinates for the reactant, transition state and product are i?R, i?TS a-nd i?p, 
respectively and given in A, barrier height of the transition state i?TS-R and overall reaction energy 
Sp_R in kcal/mol. T^^x are relative timings to EFMO-RHF/6-31G(d) using the EFMO:S model with 
the fully minimized reaction coordinate on the trajectory subject to harmonic constraints, r^f"/' are for 
the entire path. 
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Table 2. Reaction barriers of chorismate mutase calculated using ONIOM. 





Rr 


Rts 


Rp 


Ets-r 


Ep-R 


MP2/6-31G(d) 


-1.83 


0.13 


1.56 


22.24 


-3.20 


MP2/cc-pVDZ 


-1.83 


-0.36 


1.56 


19.75 


-5.48 


MP2/cc-pVTZ 


-1.83 


0.13 


1.56 


21.79 


-1.14 


MP2/cc-pVQZ 


-1.83 


0.13 


1.56 


21.68 


-0.82 


B3LYP/6-31G(d) 


-1.83 


0.13 


1.39 


25.19 


3.81 


B3LYP/cc-pVDZ 


-1.83 


0.13 


1.39 


23.81 


2.58 


B3LYP/cc-pVTZ 


-1.83 


0.13 


1.56 


24.62 


4.36 


B3LYP/cc-pVQZ 


-1.83 


0.13 


1.56 


24.66 


4.16 



The reaction coordinates in A for the reactant, transition state and product are Rr, Rts and Rp, 
respectively. The barrier height of the transition state i?TS-R and the overall reaction energy i?p_R are 
in kcal/mol. 
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Table 3. Reaction barriers of chorismate mutase calculated using multilayer EFMO and 
FM02 calculations. 

Rr Rts Rp Ets~r Ep^ji 

EFMO-RHF:MP2/6-31G(d) -1.64 -0.11 1.39 27.64 -4.70 
FM02-RHF:MP2/6-31G(d) -1.64 -0.11 1.88 29.22 -6.41 

The reaction coordinates in A for the reactant, transition state and product are Rn, Rts and Rp, 
respectively. The barrier height of the transition state i?TS-R and the overall reaction energy i?p_R are 
in kcal/mol. 
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Supporting Information 

Figure SI. Reaction barrier calculated at the MP2/cc-pVDZ:EFMO-RHF/6-31G(d) level of theory for 
EFMO:S using i?iosdim = 1-5 and FDD (modfd=3). This snapshot shows the effect of not having 
enough flexibility in the active region around the substrate. 

Figure S2. Two different starting geometries with chorismate and Phe57 shown as sticks from the MD 
simulation. A) shows a configuration which results in a successful reaction path and B) a configuration 
which results in an unsuccessfuU reaction path (see Figure SI). The position of Phe57 coupled with a 
placement in the buffer region (6) makes it unable to move to accomodate the conversion of chorismate 
to prephenate. 

Table SI. Complete listing of all residues in the protein model (PDB: 2CHT) along with their 
protonation state after being protonated using the PDB2PQR tool. 

Table S2. Reaction barriers of chorismate mutase calculated using ONIOM for EFMO:S using 
-Rrosdim = 1.5 and FD (modfd=l). 

Table S3. Reaction barriers of chorismate mutase calculated using ONIOM for EFMO:S using 
i?,g,dim = 2.0 and FDD (modfd=3). 

Table S4. Reaction barriers of chorismate mutase calculated using ONIOM for EFMO:L using 
i?rosdim = 1.5 and FDD (modfd=3). 

Table S5. Reaction barriers of chorismate mutase calculated using ONIOM for FM02:S using 
-Rrosdim = 1.5 and FDD (modfd=3). 



